Create link to repo that installs MIMIC in a Vagrant VM #31

Merged
tompollard merged 1 commit into MIT-LCP:master from the vagrant_vm_with_mimic branch on Jan 17, 2016

Conversation

nsh87
Contributor

@nsh87 nsh87 commented Nov 20, 2015

I'm not sure how best to give this code back to the community (I'm up for alternative ways if this isn't the best). I created a repo that automates the steps given on the website here inside a VM: http://mimic.physionet.org/tutorials/install_mimic_locally/.

There are a lot of advantages to loading the MIMIC II data into a VM. For those that are new to databases, this should make it very easy to get started without mucking about with local databases. And for those who are familiar with software development, I'm sure they could appreciate getting the DB up and running with less work than usual.

@alistairewj
Member

Very much welcome these sorts of contributions. How much effort would it take to update this for MIMIC-III? I think we're going to try to keep this repository focused on MIMIC-III code, just because of the huge differences between II and III.

@nsh87
Contributor Author

nsh87 commented Nov 21, 2015

I haven't explored MIMIC III yet, but probably not too much effort. I can put something together and update this PR with a link to a repo that does the same for MIMIC III; or I might just update this same repo to do both MIMIC II and MIMIC III, with a prompt for which to use. Thoughts?

@alistairewj
Member

I'd create a new repo for MIMIC-III exclusively as you say. There aren't that many changes - the scripts in this repository should be up to date and facilitate making the Vagrant file (though admittedly I haven't run through them on the latest v1.2 release).

We could also add a link to your current repo at the bottom of the README for users who would like to use MIMIC-II.

@nsh87
Contributor Author

nsh87 commented Nov 21, 2015

Ok, sounds good. When it's ready to go I'll update the PR with a link to the new repo.

@alistairewj
Member

Awesome, thanks!

@nsh87
Contributor Author

nsh87 commented Dec 6, 2015

@alistairewj: haven't forgotten about this, just wrapping up a couple other projects in the next week or two, then will be able to get to doing this for MIMIC III.

@alistairewj
Member

No problem - we have a new version with very slight changes coming up (renamed columns etc) - might be worth it to wait for that!

@nsh87
Contributor Author

nsh87 commented Jan 12, 2016

@alistairewj: Nearly done using version 3...just run vagrant up and all the magic happens automatically. However, one final thing I'm having an issue with: I'm validating that the data loads correctly and I used the table row counts on the site (e.g. ADMISSIONS table counts here). Some of my tables' row counts are different than what's provided there, despite the unzipped file checksums matching what they're supposed to be. Do you know if the counts there are current?

Running data validation script  Tue Jan 12 03:31:32 UTC 2016
ADMISSIONS  Expecting 58976 rows.   58976 found.
CALLOUT  Expecting 34499 rows.   34499 found.
CAREGIVERS  Expecting 8221 rows.    7567 found.  ** ERRORS FOUND. **
CHARTEVENTS  Expecting 257495071 rows.   263201375 found.  ** ERRORS FOUND. **
CPTEVENTS  Expecting 573146 rows.   573146 found.
DATETIMEEVENTS  Expecting 4486049 rows.   4486049 found.
D_CPT  Expecting 134 rows.     134 found.
DIAGNOSES_ICD  Expecting 651047 rows.   651047 found.
D_ICD_DIAGNOSES  Expecting 14567 rows.   14567 found.
D_ICD_PROCEDURES  Expecting 3882 rows.    3882 found.
D_ITEMS  Expecting 15492 rows.   12478 found.  ** ERRORS FOUND. **
D_LABITEMS  Expecting 755 rows.     755 found.
DRGCODES  Expecting 125557 rows.   125557 found.
ICUSTAYS  Expecting 61532 rows.   61532 found.
INPUTEVENTS_CV  Expecting 17528895 rows.   17528894 found.  ** ERRORS FOUND. **
INPUTEVENTS_MV  Expecting 3618992 rows.   3618991 found.  ** ERRORS FOUND. **
LABEVENTS  Expecting 27872575 rows.   27872575 found.
MICROBIOLOGYEVENTS  Expecting 328446 rows.   328446 found.
OUTPUTEVENTS  Expecting 4349340 rows.   4349339 found.  ** ERRORS FOUND. **
PATIENTS  Expecting 46520 rows.   46520 found.
PRESCRIPTIONS  Expecting 4156848 rows.   4156848 found.
PROCEDUREEVENTS_MV  Expecting 258066 rows.   258066 found.
PROCEDURES_ICD  Expecting 240095 rows.   240095 found.
SERVICES  Expecting 73344 rows.   73343 found.  ** ERRORS FOUND. **
TRANSFERS  Expecting 261897 rows.   261897 found.
Data validation complete  Tue Jan 12 03:37:24 UTC 2016

I'm tempted to replace the expected row counts in the validation script with my current counts to make the script work properly...because I assume everything loaded correctly since some of the tables' counts match the counts on the site, and the ones that differ aren't off by a significant number of rows. Except for maybe D_ITEMS.

Btw, ** ERRORS FOUND. ** just means the row count differs, nothing special there.
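
For reference, the comparison behind that output can be sketched in a few lines of Python. This is only a minimal sketch, not the actual validation script from the Vagrant repo: the function name `compare_row_counts` and the report format are assumptions, and the dictionaries hold just a handful of counts copied from the output above.

```python
def compare_row_counts(expected, found):
    """Return (table, expected, found, difference) for each table whose counts differ."""
    mismatches = []
    for table in sorted(expected):
        want = expected[table]
        got = found.get(table)  # None if the table was never loaded
        if got != want:
            diff = None if got is None else got - want
            mismatches.append((table, want, got, diff))
    return mismatches


# A few counts copied from the validation output above.
expected = {"ADMISSIONS": 58976, "CAREGIVERS": 8221, "D_ITEMS": 15492, "SERVICES": 73344}
found = {"ADMISSIONS": 58976, "CAREGIVERS": 7567, "D_ITEMS": 12478, "SERVICES": 73343}

for table, want, got, diff in compare_row_counts(expected, found):
    note = "table missing" if diff is None else f"difference {diff:+d}"
    print(f"{table}: expected {want}, found {got} ({note})")
```

Printing the signed difference makes it easier to tell the off-by-one tables apart from the larger gaps such as D_ITEMS.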

@pszolovits
Contributor

In my MySQL installation of MIMIC-III v1.3, I see the following, which agrees with Nikhil’s numbers:

mysql> DROP PROCEDURE IF EXISTS COUNT_ALL_RECORDS_BY_TABLE;
Query OK, 0 rows affected, 1 warning (0.03 sec)

mysql> DELIMITER $$
mysql> CREATE DEFINER=root@127.0.0.1 PROCEDURE COUNT_ALL_RECORDS_BY_TABLE()
-> BEGIN
-> DECLARE done INT DEFAULT 0;
-> DECLARE TNAME CHAR(255);
->
-> DECLARE table_names CURSOR for
-> SELECT table_name FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = DATABASE();
->
-> DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
->
-> OPEN table_names;
->
-> DROP TABLE IF EXISTS TCOUNTS;
-> CREATE TEMPORARY TABLE TCOUNTS
-> (
-> TABLE_NAME CHAR(255),
-> RECORD_COUNT INT
-> ) ENGINE = MEMORY;
->
->
-> WHILE done = 0 DO
->
-> FETCH NEXT FROM table_names INTO TNAME;
->
-> IF done = 0 THEN
-> SET @SQL_TXT = CONCAT("INSERT INTO TCOUNTS(SELECT '" , TNAME , "' AS TABLE_NAME, COUNT(*) AS RECORD_COUNT FROM ", TNAME, ")");
->
-> PREPARE stmt_name FROM @SQL_TXT;
-> EXECUTE stmt_name;
-> DEALLOCATE PREPARE stmt_name;
-> END IF;
->
-> END WHILE;
->
-> CLOSE table_names;
->
-> SELECT * FROM TCOUNTS;
->
-> SELECT SUM(RECORD_COUNT) AS TOTAL_DATABASE_RECORD_CT FROM TCOUNTS;
->
-> END
-> $$
Query OK, 0 rows affected (0.03 sec)

mysql> DELIMITER ;
mysql>
mysql> CALL COUNT_ALL_RECORDS_BY_TABLE();
+--------------------+--------------+
| TABLE_NAME         | RECORD_COUNT |
+--------------------+--------------+
| admissions         |        58976 |
| callout            |        34499 |
| caregivers         |         7567 |
| chartevents        |    263201375 |
| cptevents          |       573146 |
| d_cpt              |          134 |
| d_icd_diagnoses    |        14567 |
| d_icd_procedures   |         3882 |
| d_items            |        12478 |
| d_labitems         |          755 |
| datetimeevents     |      4486049 |
| diagnoses_icd      |       651047 |
| drgcodes           |       125557 |
| drglist            |        58923 |
| elixhauser         |        58890 |
| icustays           |        61532 |
| inputevents_cv     |     17528894 |
| inputevents_mv     |      3618991 |
| labevents          |     27872575 |
| microbiologyevents |       328446 |
| noteevents         |      2078705 |
| outputevents       |      4349339 |
| patients           |        46520 |
| prescriptions      |      4156848 |
| procedureevents_mv |       258066 |
| procedures_icd     |       240095 |
| services           |        73343 |
| transfers          |       261897 |
+--------------------+--------------+
28 rows in set (3 min 36.74 sec)

+--------------------------+
| TOTAL_DATABASE_RECORD_CT |
+--------------------------+
|                330163096 |
+--------------------------+
1 row in set (3 min 36.74 sec)

Query OK, 0 rows affected (3 min 36.74 sec)
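
The same idea (list the tables, then run COUNT(*) on each one) can be sketched in Python. To keep the example runnable anywhere, it uses the standard-library `sqlite3` module with an in-memory database as a stand-in; it is not MySQL, and the function name, toy tables, and rows are made up for illustration.

```python
import sqlite3


def count_all_records_by_table(conn):
    """Return {table_name: row_count} for every table in a SQLite database."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    tables = [row[0] for row in cur.fetchall()]
    counts = {}
    for name in tables:
        # Identifiers cannot be bound as query parameters, so quote the name instead.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = cur.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Toy in-memory database standing in for a real MIMIC install.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (subject_id INTEGER)")
conn.execute("CREATE TABLE admissions (hadm_id INTEGER)")
conn.executemany("INSERT INTO patients VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO admissions VALUES (?)", [(10,), (11,)])

counts = count_all_records_by_table(conn)
print(counts)
print("total:", sum(counts.values()))
```

Against a real server, the table list would come from that database's own catalog (INFORMATION_SCHEMA.TABLES in the MySQL procedure above) rather than `sqlite_master`.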

@tompollard
Member

Thanks @nsh87 and @pszolovits. The testing script at: https://github.com/MIT-LCP/mimic-code/blob/master/tests/test_postgres_build.py includes the current row counts for each table:

# Create dictionary with table details for use in testing
row_dict = {
    "ADMISSIONS": 58976,
    "CALLOUT": 34499,
    "CAREGIVERS": 7567,
    "CHARTEVENTS": 263201375,
    "CPTEVENTS": 573146,
    "D_CPT": 134,
    "D_ICD_DIAGNOSES": 14567,
    "D_ICD_PROCEDURES": 3882,
    "D_ITEMS": 12478,
    "D_LABITEMS": 755,
    "DATETIMEEVENTS": 4486049,
    "DIAGNOSES_ICD": 651047,
    "DRGCODES": 125557,
    "ICUSTAYS": 61532,
    "INPUTEVENTS_CV": 17528894,
    "INPUTEVENTS_MV": 3618991,
    "LABEVENTS": 27872575,
    "MICROBIOLOGYEVENTS": 328446,
    "NOTEEVENTS": 2078705,
    "OUTPUTEVENTS": 4349339,
    "PATIENTS": 46520,
    "PRESCRIPTIONS": 4156848,
    "PROCEDUREEVENTS_MV": 258066,
    "PROCEDURES_ICD": 240095,
    "SERVICES": 73343,
    "TRANSFERS": 261897,
}

So from a quick glance it looks like your row counts are correct and the website needs updating. We're hoping to generate the documentation directly from the database in future (perhaps using https://github.com/vokal/pg-table-markdown) which will help to ensure it stays up to date.
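
As a rough illustration of generating that documentation from data rather than by hand, a row-count table could be rendered from a dictionary like the one above. This is only a sketch of the idea: it does not use pg-table-markdown, and `row_counts_to_markdown` is a hypothetical helper.

```python
def row_counts_to_markdown(row_counts):
    """Render {table: count} as a two-column Markdown table, sorted by table name."""
    lines = ["| Table | Rows |", "| --- | ---: |"]
    for table in sorted(row_counts):
        lines.append(f"| {table} | {row_counts[table]:,} |")
    return "\n".join(lines)


# Three entries taken from the dictionary above.
sample = {"PATIENTS": 46520, "ADMISSIONS": 58976, "ICUSTAYS": 61532}
print(row_counts_to_markdown(sample))
```

In practice the counts would be queried from the live database at build time, so the published numbers cannot drift from what users actually load.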

tompollard added a commit to MIT-LCP/mimic-website that referenced this pull request Jan 12, 2016
tompollard added a commit that referenced this pull request Jan 17, 2016
Create link to repo that installs MIMIC in a Vagrant VM
@tompollard tompollard merged commit 5d58519 into MIT-LCP:master Jan 17, 2016
@tompollard
Member

Many thanks Nikhil, this looks great! I've not used the Git submodule functionality before but I'm wondering whether it might be useful here to link directly to your repo? Merging the readme for now...

@tompollard
Member

@nsh87 Hmm, just realised I've merged the MIMIC-II version, rather than MIMIC-III. Please could you resubmit a pull request when the new version is ready?

@nsh87
Contributor Author

nsh87 commented Jan 17, 2016

@tompollard, whoops, yes I can! It's done for MIMIC-III also now, using the new row counts, just wrapping up the documentation, most likely today. Thanks!

@tompollard
Member

Great, thanks :) Sorry, I should have been paying closer attention!

@nsh87 nsh87 deleted the vagrant_vm_with_mimic branch January 19, 2016 02:30
alistairewj pushed a commit to MIT-LCP/mimic-website that referenced this pull request Jul 17, 2017
briangow pushed a commit that referenced this pull request Apr 27, 2021
Subquery alias for postgres